[libclc] Add v3 variants of async_work_group_copy/async_work_group_strided_copy/prefetch #137932
…rided_copy/prefetch. 3-component vector types are supported for these functions per the OpenCL spec.
@frasercrmck please help review, thanks.
frasercrmck left a comment
LGTM.
I see there's a note in the spec: "async_work_group_copy and async_work_group_strided_copy for 3-component vector types behave as async_work_group_copy and async_work_group_strided_copy respectively for 4-component vector types." I'm not sure what that really means for our implementation, which does a loop and a store.
I also wonder why we have async/gentype.inc. With this change, couldn't we just use float/gentype.inc and integer/gentype.inc in succession?
The OpenCL spec requires 3-component vector types to have the same size in memory as 4-component vector types.
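To make the layout rule concrete, here is a minimal host-side C sketch. It assumes a hypothetical `float3_model` struct standing in for OpenCL's `float3` (four floats, the last one padding) and a hypothetical `strided_copy_model` function that runs the work-items of one work-group one after another. It is not the libclc source. It only illustrates why a per-element loop-and-store already behaves like the 4-component copy: each assignment moves a whole 16-byte element, padding lane included. For simplicity the model takes both a source and a destination stride, whereas the real built-ins take a single stride that applies to whichever side is in global memory.

```c
#include <stddef.h>

/* Hypothetical host-side stand-in for OpenCL's float3: 3-component vectors
 * occupy the same storage as 4-component ones, so s[3] is padding. */
typedef struct {
    float s[4]; /* s[0..2] hold x, y, z; s[3] is the padding lane */
} float3_model;

_Static_assert(sizeof(float3_model) == 4 * sizeof(float),
               "modelled float3 is stored like a float4");

/* Hypothetical sequential model of a work-group strided copy (not libclc code).
 * Simulated work-item `id` copies elements id, id + group_size, ...
 * Each iteration assigns a whole element, so the padding lane is copied too. */
static void strided_copy_model(float3_model *dst, const float3_model *src,
                               size_t num_elems, size_t src_stride,
                               size_t dst_stride, size_t group_size) {
    for (size_t id = 0; id < group_size; ++id) {           /* simulated work-items */
        for (size_t i = id; i < num_elems; i += group_size) {
            dst[i * dst_stride] = src[i * src_stride];     /* one 16-byte element */
        }
    }
}
```

Calling `strided_copy_model(dst, src, n, 1, 1, group_size)` models `async_work_group_copy`; passing a `src_stride` greater than 1 models the global-to-local form of `async_work_group_strided_copy`.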
Done, thanks for the suggestions. Deleted async/gentype.inc.
frasercrmck left a comment
Thanks. We could probably come up with a combined gentype.inc that handles both at once (though both currently #undef __CLC_BODY after they finish, so we'd need a way to stop that, or to preserve __CLC_BODY across gentypes).
Come to think of it, we do a lot of unnecessary #undef __CLC_BODY around the codebase. I will try to clean that up in a separate PR.
Good idea.
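As a rough analogy for the combined gentype.inc idea, the C sketch below uses X-macros rather than libclc's include-based mechanism (where, as we understand it, a source file defines __CLC_BODY and then includes a gentype.inc that expands that body once per type). All macro and function names here are hypothetical. Because the body macro is passed as an argument instead of living in a global define, chaining the floating-point list and the integer list needs no #undef/redefine step in between.

```c
#include <stddef.h>

/* Hypothetical per-family type lists (analogy only, not libclc's files):
 * each invokes the body macro X once per type. */
#define FLOAT_GENTYPES(X) X(float) X(double)
#define INTEGER_GENTYPES(X) X(int) X(unsigned)

/* Combined list: floating-point types, then integer types, in succession. */
#define ALL_GENTYPES(X) FLOAT_GENTYPES(X) INTEGER_GENTYPES(X)

/* Body instantiated for every type: defines size_of_<type>(). */
#define DEFINE_SIZE_OF(T) static size_t size_of_##T(void) { return sizeof(T); }
ALL_GENTYPES(DEFINE_SIZE_OF)

/* The same combined list reused with a second body: counts the types. */
#define COUNT_ONE(T) +1
enum { NUM_GENTYPES = 0 ALL_GENTYPES(COUNT_ONE) };
```

The trade-off is that every body must fit in a macro, which is why an include-based scheme like libclc's is easier to use for large, multi-function bodies.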
…rided_copy/prefetch (llvm#137932) 3-component vector types are supported for these functions per the OpenCL spec.